All Questions

1 question

1 vote, 0 answers, 57 views

Can I reduce computation by only predicting response tokens in a transformer and still get the same gradients?

I have been looking at the source code of the Stanford Alpaca model and I believe that during inference, the whole instruction + response data is fed into the model normally. Then the instruction part ...

Tianchen Zheng

asked Mar 29, 2023 at 4:27
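The premise of the question is that only the response tokens should contribute to the training loss. The sketch below is a minimal pure-Python illustration under stated assumptions. It assumes the per-position logits are already computed and already aligned with their target labels (the usual one-position causal shift is left out). It also assumes instruction positions are marked with a sentinel label, here `IGNORE_INDEX = -100`, chosen because that is PyTorch's default `ignore_index`. It does not claim that this is what the truncated excerpt goes on to describe. It compares two losses: a masked mean over all positions, and a mean that only ever reads the response-position logits. The two are the same function of the logits, so their gradients with respect to the logits are the same, and changing the instruction-position logits leaves the masked loss unchanged. The function names are hypothetical.

```python
import math

# Sentinel label meaning "no loss at this position" (assumed; -100 matches
# PyTorch's default ignore_index for CrossEntropyLoss).
IGNORE_INDEX = -100


def log_softmax(logits):
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def masked_nll(all_logits, labels):
    """Mean negative log-likelihood over positions whose label is not IGNORE_INDEX."""
    total, count = 0.0, 0
    for logits, label in zip(all_logits, labels):
        if label == IGNORE_INDEX:
            continue  # instruction position: contributes nothing to the loss
        total -= log_softmax(logits)[label]
        count += 1
    return total / count


def response_only_nll(all_logits, labels, response_start):
    """Same loss, computed only from positions at or after response_start."""
    resp_logits = all_logits[response_start:]
    resp_labels = labels[response_start:]
    return sum(-log_softmax(l)[y] for l, y in zip(resp_logits, resp_labels)) / len(resp_labels)


# Toy example: 5 positions, vocabulary of 4; positions 0-1 are the "instruction".
logits = [
    [0.1, 0.2, 0.3, 0.4],
    [1.0, 0.0, -1.0, 0.5],
    [0.3, -0.2, 0.9, 0.0],
    [2.0, 0.1, 0.1, -0.5],
    [0.0, 0.0, 0.0, 1.5],
]
labels = [IGNORE_INDEX, IGNORE_INDEX, 2, 0, 3]

full = masked_nll(logits, labels)
resp = response_only_nll(logits, labels, response_start=2)

# Perturbing the instruction-position logits does not change the masked loss,
# i.e. its gradient with respect to those logits is zero.
perturbed = [[x + 5.0 for x in row] for row in logits[:2]] + logits[2:]
full_perturbed = masked_nll(perturbed, labels)
```

This only shows that the loss computation can ignore instruction positions. The sketch takes the logits as given and says nothing about the forward pass that produces them. In a causal transformer, the response-position logits are still computed while attending to the instruction tokens.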


